Linguistically Annotated Corpora: Quality Assurance, Reusability and Sustainability
Not recorded
Abstract
Similar resources
Resources for Processing Hebrew
We describe work in progress whose main objective is to create a collection of resources and tools for processing Hebrew. These resources include corpora of written texts, some of them annotated in various degrees of detail; tools for collecting, expanding and maintaining corpora; tools for annotation; lexicons, both monolingual and bilingual; a rule-based, linguistically motivated morphologica...
Standardisation Efforts On The Level Of Dialogue Act In The MATE Project
This paper describes the state of the art of coding schemes for dialogue acts and the efforts to establish a standard in this field. We present a review and comparison of currently available schemes and outline the comparison problems we had due to domain, task, and language dependencies of schemes. We discuss solution strategies that keep the reusability of corpora in mind. Reusability is a ...
Linguistically Annotated Learner Corpora: Aspects of a Layered Linguistic Encoding and Standardized Representation
Linguistically annotated corpora that are stored in standardized digital form can be a valuable source of empirical insight. They can help verify linguistic generalizations and support the formulation of new hypotheses. The linguistic annotation of such corpora is often crucial for their effective exploration from a linguistic perspective. The annotation essentially serves as an index to the li...
An Annotated Corpus Management Tool: ChaKi
Large-scale annotated corpora are very important not only in linguistic research but also in practical natural language processing tasks, since a number of practical tools such as part-of-speech (POS) taggers and syntactic parsers are now corpus-based or machine learning-based systems which require some amount of accurately annotated corpora. This article presents an annotated corpus management t...
Addressing the Resource Bottleneck to Create Large-Scale Annotated Texts
Large-scale linguistically annotated resources have become available in recent years. This is partly due to sophisticated automatic and semiautomatic approaches that work well on specific tasks such as part-of-speech tagging. For more complex linguistic phenomena like anaphora resolution there are no tools that result in high-quality annotations without massive user intervention. Annotated corpo...